The policy gradient method enjoys the simplicity of the objective where the agent optimizes the cumulative reward directly. Moreover, in the continuous action domain, parameterized distribution of action distribution allows easy control of exploration, resulting from the variance of the representing distribution. Entropy can play an essential role in policy optimization by selecting the stochastic policy, which eventually helps better explore the environment in reinforcement learning (RL). However, the stochasticity often reduces as the training progresses; thus, the policy becomes less exploratory. Additionally, certain parametric distributions might only work for some environments and require extensive hyperparameter tuning. This paper aims to mitigate these issues. In particular, we propose an algorithm called Robust Policy Optimization (RPO), which leverages a perturbed distribution. We hypothesize that our method encourages high-entropy actions and provides a way to represent the action space better. We further provide empirical evidence to verify our hypothesis. We evaluated our methods on various continuous control tasks from DeepMind Control, OpenAI Gym, Pybullet, and IsaacGym. We observed that in many settings, RPO increases the policy entropy early in training and then maintains a certain level of entropy throughout the training period. Eventually, our agent RPO shows consistently improved performance compared to PPO and other techniques: entropy regularization, different distributions, and data augmentation. Furthermore, in several settings, our method stays robust in performance, while other baseline mechanisms fail to improve and even worsen the performance.
translated by 谷歌翻译
Generative models for learning combinatorial structures have transformative impacts in many applications. However, existing approaches fail to offer efficient and accurate learning results. Because of the highly intractable nature of the gradient estimation of the learning objective subject to combinatorial constraints. Existing gradient estimation methods would easily run into exponential time/memory space, or incur huge estimation errors due to improper approximation. We develop NEural Lovasz Sampler (Nelson), a neural network based on Lov\'asz Local Lemma (LLL). We show it guarantees to generate samples satisfying combinatorial constraints from the distribution of the constrained Markov Random Fields model (MRF) under certain conditions. We further present a fully differentiable contrastive-divergence-based learning framework on constrained MRF (Nelson-CD). Meanwhile, Nelson-CD being fully differentiable allows us to take advantage of the parallel computing power of GPUs, resulting in great efficiency. Experimental results on three real-world combinatorial problems reveal that Nelson learns to generate 100% valid structures. In comparison, baselines either time out on large-size data sets or fail to generate valid structures, whereas Nelson scales much better with problem size. In addition, Nelson outperforms baselines in various learning metrics, such as log-likelihood and MAP scores.
translated by 谷歌翻译
Approximately 1.25 million people in the United States are treated each year for burn injuries. Precise burn injury classification is an important aspect of the medical AI field. In this work, we propose an explainable human-in-the-loop framework for improving burn ultrasound classification models. Our framework leverages an explanation system based on the LIME classification explainer to corroborate and integrate a burn expert's knowledge -- suggesting new features and ensuring the validity of the model. Using this framework, we discover that B-mode ultrasound classifiers can be enhanced by supplying textural features. More specifically, we confirm that texture features based on the Gray Level Co-occurance Matrix (GLCM) of ultrasound frames can increase the accuracy of transfer learned burn depth classifiers. We test our hypothesis on real data from porcine subjects. We show improvements in the accuracy of burn depth classification -- from ~88% to ~94% -- once modified according to our framework.
translated by 谷歌翻译
深度强化学习(RL)代理通常过于适应训练环境,从而导致概括性能差。在本文中,我们提出了思想家,这是一种自举方法,以无监督的方式从观察结果中消除混杂特征的对抗性效应,从而改善了RL代理的概括。思想家第一集群将轨迹经历了几个群集。然后,通过应用样式传输生成器来启动这些轨迹,这将轨迹从一个群集的样式转换为另一个群集的样式,同时保持观测值的内容。然后将自举轨迹用于政策学习。思想家在许多RL设置中具有广泛的适用性。实验结果表明,与基础算法和几种数据增强技术相比,思想家可以在Procgen基准环境中提高概括能力。
translated by 谷歌翻译
Few Shot Instance Segmentation (FSIS) requires models to detect and segment novel classes with limited several support examples. In this work, we explore a simple yet unified solution for FSIS as well as its incremental variants, and introduce a new framework named Reference Twice (RefT) to fully explore the relationship between support/query features based on a Transformer-like framework. Our key insights are two folds: Firstly, with the aid of support masks, we can generate dynamic class centers more appropriately to re-weight query features. Secondly, we find that support object queries have already encoded key factors after base training. In this way, the query features can be enhanced twice from two aspects, i.e., feature-level and instance-level. In particular, we firstly design a mask-based dynamic weighting module to enhance support features and then propose to link object queries for better calibration via cross-attention. After the above steps, the novel classes can be improved significantly over our strong baseline. Additionally, our new framework can be easily extended to incremental FSIS with minor modification. When benchmarking results on the COCO dataset for FSIS, gFSIS, and iFSIS settings, our method achieves a competitive performance compared to existing approaches across different shots, e.g., we boost nAP by noticeable +8.2/+9.4 over the current state-of-the-art FSIS method for 10/30-shot. We further demonstrate the superiority of our approach on Few Shot Object Detection. Code and model will be available.
translated by 谷歌翻译
This paper focuses on designing efficient models with low parameters and FLOPs for dense predictions. Even though CNN-based lightweight methods have achieved stunning results after years of research, trading-off model accuracy and constrained resources still need further improvements. This work rethinks the essential unity of efficient Inverted Residual Block in MobileNetv2 and effective Transformer in ViT, inductively abstracting a general concept of Meta-Mobile Block, and we argue that the specific instantiation is very important to model performance though sharing the same framework. Motivated by this phenomenon, we deduce a simple yet efficient modern \textbf{I}nverted \textbf{R}esidual \textbf{M}obile \textbf{B}lock (iRMB) for mobile applications, which absorbs CNN-like efficiency to model short-distance dependency and Transformer-like dynamic modeling capability to learn long-distance interactions. Furthermore, we design a ResNet-like 4-phase \textbf{E}fficient \textbf{MO}del (EMO) based only on a series of iRMBs for dense applications. Massive experiments on ImageNet-1K, COCO2017, and ADE20K benchmarks demonstrate the superiority of our EMO over state-of-the-art methods, \eg, our EMO-1M/2M/5M achieve 71.5, 75.1, and 78.4 Top-1 that surpass \textbf{SoTA} CNN-/Transformer-based models, while trading-off the model accuracy and efficiency well.
translated by 谷歌翻译
Despite significant progress in object categorization, in recent years, a number of important challenges remain; mainly, the ability to learn from limited labeled data and to recognize object classes within large, potentially open, set of labels. Zero-shot learning is one way of addressing these challenges, but it has only been shown to work with limited sized class vocabularies and typically requires separation between supervised and unsupervised classes, allowing former to inform the latter but not vice versa. We propose the notion of vocabulary-informed learning to alleviate the above mentioned challenges and address problems of supervised, zero-shot, generalized zero-shot and open set recognition using a unified framework. Specifically, we propose a weighted maximum margin framework for semantic manifold-based recognition that incorporates distance constraints from (both supervised and unsupervised) vocabulary atoms. Distance constraints ensure that labeled samples are projected closer to their correct prototypes, in the embedding space, than to others. We illustrate that resulting model shows improvements in supervised, zero-shot, generalized zero-shot, and large open set recognition, with up to 310K class vocabulary on Animal with Attributes and ImageNet datasets.
translated by 谷歌翻译
Deploying reliable deep learning techniques in interdisciplinary applications needs learned models to output accurate and ({even more importantly}) explainable predictions. Existing approaches typically explicate network outputs in a post-hoc fashion, under an implicit assumption that faithful explanations come from accurate predictions/classifications. We have an opposite claim that explanations boost (or even determine) classification. That is, end-to-end learning of explanation factors to augment discriminative representation extraction could be a more intuitive strategy to inversely assure fine-grained explainability, e.g., in those neuroimaging and neuroscience studies with high-dimensional data containing noisy, redundant, and task-irrelevant information. In this paper, we propose such an explainable geometric deep network dubbed as NeuroExplainer, with applications to uncover altered infant cortical development patterns associated with preterm birth. Given fundamental cortical attributes as network input, our NeuroExplainer adopts a hierarchical attention-decoding framework to learn fine-grained attentions and respective discriminative representations to accurately recognize preterm infants from term-born infants at term-equivalent age. NeuroExplainer learns the hierarchical attention-decoding modules under subject-level weak supervision coupled with targeted regularizers deduced from domain knowledge regarding brain development. These prior-guided constraints implicitly maximizes the explainability metrics (i.e., fidelity, sparsity, and stability) in network training, driving the learned network to output detailed explanations and accurate classifications. Experimental results on the public dHCP benchmark suggest that NeuroExplainer led to quantitatively reliable explanation results that are qualitatively consistent with representative neuroimaging studies.
translated by 谷歌翻译
Improving the visual quality of the given degraded observation by correcting exposure level is a fundamental task in the computer vision community. Existing works commonly lack adaptability towards unknown scenes because of the data-driven patterns (deep networks) and limited regularization (traditional optimization), and they usually need time-consuming inference. These two points heavily limit their practicability. In this paper, we establish a Practical Exposure Corrector (PEC) that assembles the characteristics of efficiency and performance. To be concrete, we rethink the exposure correction to provide a linear solution with exposure-sensitive compensation. Around generating the compensation, we introduce an exposure adversarial function as the key engine to fully extract valuable information from the observation. By applying the defined function, we construct a segmented shrinkage iterative scheme to generate the desired compensation. Its shrinkage nature supplies powerful support for algorithmic stability and robustness. Extensive experimental evaluations fully reveal the superiority of our proposed PEC. The code is available at https://rsliu.tech/PEC.
translated by 谷歌翻译
Sparse principal component analysis (SPCA) has been widely used for dimensionality reduction and feature extraction in high-dimensional data analysis. Despite there are many methodological and theoretical developments in the past two decades, the theoretical guarantees of the popular SPCA algorithm proposed by Zou, Hastie & Tibshirani (2006) based on the elastic net are still unknown. We aim to close this important theoretical gap in this paper. We first revisit the SPCA algorithm of Zou et al. (2006) and present our implementation. Also, we study a computationally more efficient variant of the SPCA algorithm in Zou et al. (2006) that can be considered as the limiting case of SPCA. We provide the guarantees of convergence to a stationary point for both algorithms. We prove that, under a sparse spiked covariance model, both algorithms can recover the principal subspace consistently under mild regularity conditions. We show that their estimation error bounds match the best available bounds of existing works or the minimax rates up to some logarithmic factors. Moreover, we demonstrate the numerical performance of both algorithms in simulation studies.
translated by 谷歌翻译